Abstract

Despite the availability of very detailed data on financial markets, agent-based modeling is hindered by the lack of information about real trader behavior. This makes it impossible to validate agent-based models, which are thus reverse-engineering attempts. This work is a contribution to the building of a set of stylized facts about the traders themselves. Using the client database of Swissquote Bank SA, the largest on-line Swiss broker, we find empirical relationships between turnover, account values and the number of assets in which a trader is invested. A theory based on simple mean-variance portfolio optimization that crucially includes variable transaction costs is able to reproduce the observed behaviors faithfully. We finally argue that our results bring to light the collective ability of a population to construct a mean-variance portfolio that takes into account the structure of transaction costs.



Early results in connection with this project have been presented at the Fribourg Symposium (Oct. 2008, unifr.ch/econophysics/symposium), the Tokyo APFA7 Workshop (Feb. 2009, thic-apfa7.com), the EPFL Alliance Carrefour (Mar. 2009, alliance-tt.ch/Carrefours), and the Zurich Workshop on Complex Socio-Economic Systems (Jun. 2009, soms.ethz.ch/workshop2009).
1 Introduction

The availability of large data sets on financial markets is one of the main reasons behind the number and variety of works devoted to their analysis in various fields, especially in Econophysics, since physicists much prefer to deal with very large data sets. At the macroscopic level, the analysis of millions of tick-by-tick data points has uncovered striking regularities of price, volume, volatility, and order book dynamics (see [22 (3 QjJl |5] for reviews). Since these phenomena are caused by the behavior of individual traders, news, and the interplay between the two, finding a microscopic mechanism that allows agent-based models to reproduce some of these stylized facts is an important endeavor, meant to give us insight into the causes of large fluctuations, be it herding [18], competition for predictability [TB], portfolio optimization leading to market instability [30], or chaotic transitions [9].

Market phenomenology appears as a typical example of collective phenomena to the eyes of statistical physicists. Thus, the temptation to regard the numerous power laws found in empirical works as signatures of criticality is intense. But if the former are really due to a phase transition, one wishes at least to know what the phases are, which is hard to guess from the data alone. According to early theoretical herding models [18], the phase transition may lie in the density of social communication and imitation and is of percolation type, thereby linking power-law distributed price and volume, criticality and agent behavior. The standard Minority Game [15] also has a single phase transition point where market predictability is entirely removed by the agents, without any spectacular effect on price and volume; on the other hand, grand-canonical MGs [33, 25, 1TJH2], which allow the agents not to play, have a semi-line of critical points that do produce stylized facts of price, volume and volatility dynamics. In the framework of statistical physics, the phase transition is due to symmetry breaking, i.e., it is a transition between predictable and perfectly efficient markets; this also suggests that the emergence of large fluctuations is due to market efficiency.

There are of course many other possible origins of power laws in financial markets that have nothing to do with a second-order phase transition. The simplest mechanism is to consider multiplicative random walks with a reflecting boundary [29]. Long-range memory of volatility is well reproduced in agent-based models whose agents act or do nothing depending on a criterion based on a random walk [6]. Assuming pre-existing power-law distributed wealth, an effective theory of market phenomenology links the distributions of price returns, volume, and trader wealth [23]. On the other hand, markets are able to produce power-law distributed price returns by simple mechanisms of limit order placement and removal, without the need for wealth inequality [14, 22]. However, one then needs to explain why limit orders are placed in such a manner; the heterogeneity of time scales, if power-law distributed, may explain order placement far away from the best prices [26], but additional work is needed to explain order placement near the best prices, which causes these large price moves. Finally, a recent simple model of investment with leverage is able to reproduce some stylized facts [36].

But mechanisms alone may not be sufficient to replicate the full complexity of financial markets, as some of it may lie instead in the heterogeneity of the agents themselves. While the need for heterogeneous agents in this context is intuitive (see e.g. [2]), there is no easily available data against which to test or to validate an agent-based model microscopically. Even if it is relatively easy to design agent-based models that reproduce some of the stylized facts of financial markets (see e.g. j27J EH 13 E2 [[]), one never knows whether this is achieved for good reasons, except for volatility clustering [6]: it is to be expected that real traders sometimes behave at odds with one's intuition. Thus, without data about the traders themselves, one is left with the often frustrating and time-consuming task of reverse-engineering the market in order to determine the right ingredients indirectly. Some progress has been made recently with the analysis of transactions in the Spanish stock market aggregated by brokers [37], hence with mesoscale resolution.

Data on trader behavior is found in the files of brokers, usually shrouded in secrecy. But 
this lack of data accessibility is not entirely to blame for the current ignorance of real-trader 
dynamics: researchers, even when given access to broker data, have focused on trading gains 
and behavioral biases, often with factor-based analyses (see e.g. (SHU [20]). 

We aim to provide a coherent picture of how various types of traders behave and interact, making it possible for agent-based models to rest on a much more solid basis. This paper is the first of a series that will establish stylized facts about trader characteristics and behavior. One of the most important aspects of these papers will be to characterize the heterogeneity of the traders in all respects (account value, turnover, trading frequency, behavioral biases, etc.) and the relationships between these quantities in terms of probability distributions rather than factors. This paper first describes the large data set that we use; it then focuses on the relationship between trader account value, turnover per transaction and transaction costs, both empirically and theoretically. We will show that while the traders have a spontaneous tendency to build equally-weighted portfolios, the number of stocks in a portfolio increases non-linearly with their account value, which we link to portfolio optimisation and to the broker's transaction fee structure.

2 Description of the data 

Our data are extracted from the database of the largest Swiss on-line broker, Swissquote Bank SA (further referred to as Swissquote). The sample contains comprehensive details about all the 19 million electronic orders sent by 120'000 professional and non-professional on-line traders from January 2003 to March 2009. Of these orders, 65% have been canceled or have expired and 30% have been filled; the remaining 5% were still valid as of the 31st of March 2009. Since this study focuses on turnover as a function of account value, we chose to exclude orders for products that allow traders to invest more than their account value (leveraging), i.e., orders on margin-call markets such as the foreign exchange market (FOREX) and the derivatives exchange EUREX. The resulting sample contains 50% of orders for derivatives, 40% for stocks, and 4% for bonds and funds. Finally, 70% of these orders were sent to the Swiss market, 20% to the German market and about 10% to the US market.

Swissquote clients consist of three main groups: individuals, companies, and asset managers. 
Individual traders, also referred to as retail clients, are mainly non-professional traders acting 
for their own account. The accounts of companies are usually managed by individuals trading 
on behalf of a company and, as we shall see, behave very much like retail clients, albeit with a 
larger typical account value. Finally, asset managers manage accounts of individuals and/or 
companies, some of them dealing with more than a thousand clients; their behavior differs
markedly from that of the other two categories of clients. 

3 Results 

3.1 Account values 

Numerous studies have been devoted to the analysis and modeling of wealth dynamics and 
distribution among a population (see [ID] and references therein). The general picture is 
that in a population, a very large majority lies in the exponential part of the reciprocal 
cumulative distribution function, while the wealth of the richest people is Pareto-distributed, 
i.e., according to a power-law. 

The account value of Swissquote traders is by definition the sum of all their assets (cash, stocks, bonds, derivatives, funds, deposits), and is denoted by P_v. In order to simplify our analysis, we compute P_v once per day after US markets close and take this value as a proxy for the next day's account value. Figure 1 displays this distribution computed at the time of the first and last transactions of the clients. Results are shown for the three main categories of clients. Maximum likelihood fits of the Pareto model p(x) ∝ (x/x_min)^{-γ} to the tail of the individual traders' distribution were performed using the BC_a bootstrap method of [21], with the parameter x_min determined by minimizing the Kolmogorov-Smirnov statistic as in [16]. Results are reported in Table 1.
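The tail-fitting step can be sketched as follows, assuming the continuous Pareto density p(x) ∝ (x/x_min)^{-γ} for x ≥ x_min. For a given x_min the maximum likelihood estimate of γ has a closed form, and x_min is chosen by scanning candidate values and keeping the one with the smallest Kolmogorov-Smirnov distance, in the spirit of [16]. The function names and the `min_tail` safeguard are hypothetical choices of ours, and the BC_a bootstrap used for the confidence intervals is not reproduced here.

```python
import numpy as np


def pareto_gamma_mle(tail, xmin):
    """Closed-form MLE of gamma for p(x) ~ (x / xmin)**(-gamma), x >= xmin."""
    tail = np.asarray(tail, dtype=float)
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


def ks_distance(tail, xmin, gamma):
    """Kolmogorov-Smirnov distance between the empirical and the fitted tail CDFs."""
    x = np.sort(np.asarray(tail, dtype=float))
    n = x.size
    fitted = 1.0 - (x / xmin) ** (1.0 - gamma)
    above = np.arange(1, n + 1) / n - fitted  # empirical CDF just after each point
    below = fitted - np.arange(0, n) / n  # empirical CDF just before each point
    return max(above.max(), below.max())


def fit_pareto_tail(values, min_tail=200):
    """Scan candidate xmin values; keep the (xmin, gamma) pair with the smallest KS distance."""
    values = np.sort(np.asarray(values, dtype=float))
    best = None
    for xmin in np.unique(values[:-min_tail]):  # keep at least min_tail points in the tail
        tail = values[values >= xmin]
        gamma = pareto_gamma_mle(tail, xmin)
        distance = ks_distance(tail, xmin, gamma)
        if best is None or distance < best[2]:
            best = (xmin, gamma, distance)
    return best[0], best[1]


# Synthetic check: Pareto samples with xmin = 1 and gamma = 2.5 (hypothetical values)
rng = np.random.default_rng(0)
sample = (1.0 - rng.random(2000)) ** (-1.0 / 1.5)
xmin_hat, gamma_hat = fit_pareto_tail(sample)
```

Scanning every observed value is quadratic in the sample size; for millions of accounts one would scan a coarser grid of candidate x_min values instead.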

Figure 1: Reciprocal cumulative distribution function of the portfolio value P_v for the three categories of clients at the time of their first (empty symbols) and last (filled symbols) transactions. Several models have been fitted to the data by Maximum Likelihood Estimation (MLE): the Student distribution (Pareto with plateau), the Weibull (stretched exponential), and the log-normal distribution. The best candidate, determined graphically and via bootstrapping the Kolmogorov-Smirnov test [16], was found to be the log-normal distribution, which is the only one shown here for the sake of clarity. The dashed light-blue line results from an MLE fit to the tail of the individual traders with the Pareto distribution p(x) ∝ (x/x_min)^{-γ} (see section 3.1).

Table 1: Results of the fits of the Pareto law (x/x_min)^{-γ} to the account value P_v of individuals.

Table 2: Parameter values and 95% confidence intervals for the MLE fit of the account values to the log-normal distribution lnN(μ, σ²). For each category of investors, the first and second rows correspond to the account value at the time of the first and of the last transaction, respectively (see text). Note that portfolio values have been multiplied by an arbitrary number for confidentiality reasons. This only affects the value of μ.

                           μ               σ
individuals     (first)    13.94 ± 0.02    2.87 ± 0.01
                (last)     14.25 ± 0.02    2.01 ± 0.01
companies       (first)    16.0 ± 0.2      2.0 ± 0.1
                (last)     15.9 ± 0.2      2.4 ± 0.1
asset managers  (first)    16.7 ± 0.2      1.8 ± 0.1
                (last)     16.7 ± 0.2      2.0 ± 0.1

The values of γ are in line with the wealth distribution of all major capitalist countries (see [34] for a possible origin of Pareto exponents between 2.3 and 2.5). Thus the retail clients are most probably representative of the Swiss population. The account value distributions of companies and asset managers have no clear power-law tails, in agreement with the results of a recent model that suggests a log-normal distribution of mutual fund asset sizes [31]. Consequently, Figure 1 also reports a fit of the data to log-normal distributions lnN(μ, σ²), which approximate P_>(P_v) more faithfully than the Student and the Weibull distributions for the three categories of clients, except for its extreme tail in the case of retail clients.
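For a log-normal model, the maximum likelihood estimates behind Table 2 reduce to the mean and the population standard deviation of log P_v. The sketch below computes them together with asymptotic 95% half-widths, 1.96σ/√n for μ and 1.96σ/√(2n) for σ; these normal-approximation intervals are a simplification of ours and not necessarily how the intervals of Table 2 were obtained. The synthetic parameters are hypothetical.

```python
import numpy as np


def fit_lognormal(values):
    """MLE of lnN(mu, sigma^2) with asymptotic 95% half-widths for mu and sigma."""
    logs = np.log(np.asarray(values, dtype=float))
    n = logs.size
    mu = logs.mean()
    sigma = logs.std()  # ddof=0, i.e. the maximum likelihood estimate
    return mu, sigma, 1.96 * sigma / np.sqrt(n), 1.96 * sigma / np.sqrt(2 * n)


# Synthetic account values with parameters of the order of Table 2 (hypothetical)
rng = np.random.default_rng(1)
mu, sigma, dmu, dsigma = fit_lognormal(rng.lognormal(mean=14.0, sigma=2.0, size=100_000))
```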



3.2 Mean turnover 



The turnover of a single transaction i, denoted by T_i, is defined as the price paid times the volume of the transaction and does not include transaction fees. We have excluded the traders that have leveraged positions on stocks, hence T_i ≤ P_v; more generally, one wishes to determine how the average turnover of a given trader relates to his portfolio value. In passing, since P(P_v) has fat tails, the only way the distribution of T can avoid having fat tails is if the typical turnover is proportional to log(P_v). We denote by ⟨T⟩ the mean turnover per transaction for a given client over the history of his activities.

Figure 2 reports its reciprocal cumulative distribution functions (RCDF) for stocks and derivatives for the three categories of clients; all RCDFs have a first plateau and then a fat tail. For stocks, the tails are not pure power laws, but they are for derivatives. Indeed, fitting the RCDFs with Weibull, log-normal and Zipf-Mandelbrot distributions, the last one with an exponential cut-off defined as

$$F_>^{(1)}(x) = \frac{c^{\gamma}\,e^{-\beta x}}{(c+x)^{\gamma}}, \qquad (1)$$

clearly shows that the Zipf-Mandelbrot form is the only one that does not systematically underestimate the tail of the RCDF for stocks; the estimated values of β and γ are given in Table 3.
[Figure 2: panels (a) Stocks and (b) Derivatives; horizontal axis ⟨T⟩ [a.u.]]

Figure 2: Reverse cumulative distribution function of the mean turnover per transaction for the three categories of clients, for both stock (a) and derivative (b) transactions. In the insets, the tail part of the RCDF of ⟨T_norm⟩ = ⟨T⟩/mean(⟨T⟩). The solid curves are maximum likelihood fits to (1) for stocks and to (2) for derivatives. The dotted lines are fits to the Weibull distribution and the dashed lines to the log-normal distribution.



The RCDFs related to the turnover of transactions on derivative products have clearer power-law tails for retail clients, which we fitted with a standard Zipf-Mandelbrot function, defined as

$$F_>^{(2)}(x) = \frac{c^{\gamma}}{(c+x)^{\gamma}}. \qquad (2)$$

The estimated parameters are given in Table 3; because of the power-law nature of this tail, fits with Weibull and log-normal distributions are not very good in the tails. While the decision process that allocates a budget to each type of product may be essentially the same, the buying power is larger for derivative products, which may explain the absence of a cut-off. Fits for companies and asset managers are very difficult and mostly inconclusive because of insufficient sample size; the good quality of the tail collapse (see inset) tends to indicate that the three distributions are identical, but we could not fit the RCDF of companies and asset managers with (2). As reported in Figure 2(b), log-normal distributions are adequate choices in these cases; since the quality of the fits is poor, we do not report the resulting parameters.
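As an illustration of how a fit to (2) can be carried out, the sketch below assumes that (2) reads F_>(x) = (c/(c+x))^γ, normalized so that F_>(0) = 1; the corresponding density γc^γ/(c+x)^{γ+1} is a Pareto type II (Lomax) law. For a fixed c the maximum likelihood estimate of γ is explicit, so the likelihood can be profiled over a grid of c values. The grid, the function names and the synthetic data are hypothetical, and the exponential cut-off of (1) is left out.

```python
import numpy as np


def zm_rcdf(x, c, gamma):
    """Assumed form of Eq. (2): F(x) = (c / (c + x))**gamma, so that F(0) = 1."""
    return (c / (c + np.asarray(x, dtype=float))) ** gamma


def fit_zipf_mandelbrot(x, c_grid):
    """Profile-likelihood fit of (c, gamma) for the density gamma * c**gamma / (c + x)**(gamma + 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    best = None
    for c in c_grid:
        s = np.sum(np.log1p(x / c))
        gamma = n / s  # MLE of gamma for this value of c
        loglik = n * np.log(gamma) - n * np.log(c) - (gamma + 1.0) * s
        if best is None or loglik > best[2]:
            best = (c, gamma, loglik)
    return best[0], best[1]


# Inverse-transform samples from the assumed law with c = 2 and gamma = 1.5 (hypothetical)
rng = np.random.default_rng(2)
u = 1.0 - rng.random(20_000)  # uniform on (0, 1]
turnover = 2.0 * (u ** (-1.0 / 1.5) - 1.0)
c_hat, gamma_hat = fit_zipf_mandelbrot(turnover, np.linspace(0.5, 5.0, 91))
```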

Table 3: Results of the maximum likelihood fit of P_>(⟨T⟩) with (1) and (2) for the three categories of clients. The 95% confidence intervals reported in smaller characters are computed by the bias-corrected accelerated (BC_a) bootstrap method of [21].
companies         1.66 [0.44, 2.3]
asset managers    1.93 [1.47, 2.93]    0.91 [-7.8, 4.5]
[Figure 3 panel: individuals (stocks); horizontal axis ⟨log P_v⟩; color scale: number of traders]

Figure 3: Density plot of the average log T vs the average log P_v, with a robust non-parametric fit (red line) and linear fits (dashed lines).

3.3 Mean turnover vs account value 

The relationship between ⟨T⟩ and ⟨P_v⟩ is important, as it dictates what fraction of their investable wealth the traders exchange in markets. We first produce a scatter plot of ⟨log T⟩ vs ⟨log P_v⟩ (Figure 4). In a log-log scale plot, it shows a cloud of points that is roughly increasing. A density plot is however clearer for retail clients, as there are many more points (Figure 3).

These plots make it clear that there are simple relationships between log T and log P_v. A robust non-parametric regression method [17] reveals a double linear relationship between ⟨log T⟩ and ⟨log P_v⟩ for all three categories of investors (see Figures 4 and 3):

$$\langle \log T \rangle = \beta_x \langle \log P_v \rangle + \alpha_x, \qquad (3)$$

where x = 1 when ⟨log P_v⟩ < θ_1 and x = 2 when ⟨log P_v⟩ > θ_2. Fitted values with confidence intervals are reported in Table 5.
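A minimal sketch of the double linear model (3), assuming the breakpoints θ_1 and θ_2 are already known: it simply runs ordinary least squares separately on each side. This is a plainer procedure than the robust non-parametric regression [17] used in the text; the synthetic slopes, intercepts and breakpoints below only mimic the order of magnitude of Table 5 and are hypothetical.

```python
import numpy as np


def double_linear_fit(log_p, log_t, theta1, theta2):
    """OLS fit of <log T> = beta_x <log P_v> + alpha_x below theta1 (x = 1) and above theta2 (x = 2)."""
    log_p = np.asarray(log_p, dtype=float)
    log_t = np.asarray(log_t, dtype=float)
    fits = {}
    for regime, mask in ((1, log_p < theta1), (2, log_p > theta2)):
        beta, alpha = np.polyfit(log_p[mask], log_t[mask], 1)  # slope first, then intercept
        fits[regime] = (beta, alpha)
    return fits


# Synthetic traders whose slope changes at log P_v = 14.25 (hypothetical values)
rng = np.random.default_rng(3)
log_p = rng.uniform(9.0, 19.0, 20_000)
log_t = np.where(log_p < 14.25, 0.84 * log_p + 0.73, 0.54 * log_p + 5.07)
log_t += rng.normal(0.0, 0.7, log_p.size)
fits = double_linear_fit(log_p, log_t, theta1=14.0, theta2=14.5)
```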

This result is remarkable in two respects: (i) the double linear relation, not obvious to the naked eye, separates investors into two groups; (ii) the range of values where the transition occurs is very similar across the three categories of traders.

Table 5: Parameter values and 95% confidence intervals for the double linear model (4). For each category of investors, the first and second rows correspond respectively to ⟨log P_v⟩ < θ_1 and ⟨log P_v⟩ > θ_2. For confidentiality reasons, we have multiplied P_v and T by a random number. This only affects the true values of α_x and θ_x in the table.

                  β_x            α_x             ξ_x     θ_x     R²
individuals       0.84 ± 0.02    0.73 ± 1.25     0.71    14      0.52
                  0.54 ± 0.01    5.07 ± 0.15     0.77    14.5    0.40
companies         0.81 ± 0.13    1.12 ± 8.17     0.88    15.5    0.47
                  0.50 ± 0.07    5.82 ± 1.65     1.00    15.6    0.33
asset managers    0.89 ± 0.20    -0.31 ± 0.76    0.62    15.5    0.52
                  0.63 ± 0.08    3.28 ± 5.78     0.62    16.5    0.46

The relationships above only apply to averages over all the agents. This means that there are some intrinsic quantities that make each agent deviate from this average line. Detailed examination of the regression residuals shows that the latter are for the most part (i.e. more than 95%) normally distributed with constant standard deviations ξ_x, and that the residuals deviating from the normal distributions are not fat-tailed. This directly suggests the simple relation for individual traders

$$T^i = e^{\alpha_x + \delta^i\alpha_x}\,\left(P_v^i\right)^{\beta_x}, \qquad (4)$$

where T^i and P_v^i are respectively the turnover and portfolio value of investor i, and the δ^iα_x are i.i.d. N(0, ξ_x²) idiosyncratic variations, independent from P_v, that mirror the heterogeneity of the agents. As we shall see, portfolio optimization with heterogeneous parameters yields this precise relationship.
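Relation (4) lends itself directly to agent-based modeling: given account values, one can draw heterogeneous turnovers. The sketch below assumes a single hypothetical breakpoint θ between the two regimes (the text reports two slightly different thresholds θ_1 and θ_2) and illustrative (β_x, α_x, ξ_x) values; it does not enforce T ≤ P_v.

```python
import numpy as np


def sample_turnover(p_v, regimes, theta, rng):
    """Draw T_i = exp(alpha_x + d_i) * P_v_i**beta_x with d_i ~ N(0, xi_x**2), as in Eq. (4).

    regimes = ((beta_1, alpha_1, xi_1), (beta_2, alpha_2, xi_2)); regime 1 applies
    when log P_v < theta (a single, hypothetical breakpoint).
    """
    p_v = np.asarray(p_v, dtype=float)
    turnover = np.empty_like(p_v)
    low = np.log(p_v) < theta
    for mask, (beta, alpha, xi) in ((low, regimes[0]), (~low, regimes[1])):
        d = rng.normal(0.0, xi, mask.sum())  # idiosyncratic term delta^i alpha_x
        turnover[mask] = np.exp(alpha + d) * p_v[mask] ** beta
    return turnover


rng = np.random.default_rng(4)
p_v = rng.lognormal(mean=14.0, sigma=2.0, size=50_000)  # hypothetical account values
regimes = ((0.84, 0.73, 0.71), (0.54, 5.07, 0.77))  # hypothetical (beta, alpha, xi)
turnover = sample_turnover(p_v, regimes, theta=14.25, rng=rng)
```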



3.4 Turnover rescaled by account value 



Let us now measure the typical fraction of wealth exchanged in a single transaction, defined as Q = ⟨T/P_v⟩. Since the inverse of this ratio is an indirect (and imperfect) proxy of the number N of assets that a trader owns, it also indicates how well diversified his investments are; hence, it can be viewed as a simple proxy of the risk profiles of the agents.
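In practice Q^i is a per-trader average over transactions. A plain-Python sketch, assuming a hypothetical record layout (trader id, turnover T, account value P_v taken at the previous close, in line with the daily proxy of section 3.1):

```python
from collections import defaultdict


def mean_turnover_fraction(transactions):
    """Return {trader: Q} where Q is the mean of T / P_v over that trader's transactions."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for trader, turnover, account_value in transactions:
        sums[trader] += turnover / account_value
        counts[trader] += 1
    return {trader: sums[trader] / counts[trader] for trader in sums}


records = [("a", 10.0, 100.0), ("a", 30.0, 100.0), ("b", 5.0, 50.0)]  # hypothetical records
q_by_trader = mean_turnover_fraction(records)  # about 0.2 for "a" and 0.1 for "b"
```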



3.4.1 data 



Figure 5 shows that the distributions look exponential to the naked eye for about 90% of the individuals and nearly 80% of the companies, while that of the asset managers quickly becomes more complex than a simple exponential. We derive exact relationships for this quantity in subsection 3.4.2 that show that these distributions are in fact not exponential but log-normal.



The resulting picture is that only a small fraction of customers trade a large fraction of 
their wealth on average. Interestingly, these figures show a clear difference between the 
three categories of clients. As discussed above, figure [5] roughly reflects the risk profile of 
the different types of customers: less than 10% of asset managers trade on average more 
than 20% of their clients' capital in a single transaction; this rises to 30% for companies, 
and 45% for retail clients. Note however that despite the fact that the account values of 
companies and asset managers are comparable, companies tend to have a Q closer to that 
of the individuals; this suggests either that companies hold a smaller N than asset managers 
for the same account value, or that asset managers tend to make smaller adjustments to the 
quantities of assets. 
Figure 5: Reverse cumulative distribution function of Q = ⟨T/P_v⟩, the mean ratio of the turnover over the portfolio value, for individual traders (black), companies (red) and asset managers (green). The left plot (a) is in lin-log scale and the right plot (b) in log-lin scale. Solid lines come from the theoretical predictions of section 3.4.2.



3.4.2 theory 



Since we know the distributions of T and P_v and their relationship, we are in a position to derive analytical expressions for Q^i = ⟨T/P_v⟩ of investor i. The distribution of Q across the population of on-line investors can be easily found using (4) and the distribution of P_v. Let P_{T,P_v}(t, p_v) denote the joint distribution of T and P_v:

$$P_{Q=T/P_v}(q) = \int_0^\infty p_v\,P_{T,P_v}(q p_v, p_v)\,dp_v = \int_0^\infty p_v\,P_{T|P_v}(q p_v \mid p_v)\,P_{P_v}(p_v)\,dp_v. \qquad (5)$$

Let us now assume for the sake of clarity that T = e^{α+δα} P_v^β. Given P_v, the turnover T follows a log-normal distribution with mean β log p_v + α and variance ξ². Substituting P_{T|P_v}(t|p_v) = lnN(β log p_v + α, ξ²) in (5) leads after some simplifications to

$$P_Q(q) = \frac{1}{\sqrt{2\pi}\,\xi\,q}\int_0^\infty \exp\left(-\frac{\left[\log\left(q\,p_v^{1-\beta}\right)-\alpha\right]^2}{2\xi^2}\right)P_{P_v}(p_v)\,dp_v, \qquad (6)$$

and
$$F_Q(q) = \int_q^\infty P_Q(x)\,dx = \int_0^\infty \frac{1}{2}\,\mathrm{erfc}\left(\frac{\log\left(q\,p_v^{1-\beta}\right)-\alpha}{\sqrt{2}\,\xi}\right)P_{P_v}(p_v)\,dp_v, \qquad (7)$$

where erfc(x) = (2/√π) ∫_x^∞ e^{-y²} dy is the complementary error function. As expected, when β = 0 (i.e. T and P_v are independent), the joint distribution in (5) reduces to the product of the two marginal distributions. On the other hand, when β = 1, i.e., when T is proportional to P_v, P_Q(q) = lnN(α, ξ²), which is the distribution of the factor e^{α+δα}. For other values of β, the functions P_Q and F_Q cannot be determined analytically unless P_{P_v} takes a particular form, as shown below. However, the moments of P_Q(q) can be arranged in a simpler form:

$$E(q^n) = \int_0^\infty q^n P_Q(q)\,dq = e^{n\alpha + \frac{n^2\xi^2}{2}}\int_0^\infty \frac{1}{p_v^{\,n(1-\beta)}}\,P_{P_v}(p_v)\,dp_v, \qquad (8)$$

that is, the (log-normal) moments of T/P_v times an integral term smaller than or equal to 1 (because in practice essentially all account values satisfy p_v ≥ 1). Hence, the relation E(q^n) ≤ e^{nα + n²ξ²/2}, with equality when β = 1, holds for any distribution of the account value P_v.
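Relation (8) and the resulting bound can be checked by simulation. The sketch below assumes hypothetical parameters and account values drawn from a Pareto law with p_v ≥ 1, so that the integral term in (8) is at most 1, and compares the direct Monte Carlo moment of q = e^{α+δα} p_v^{β-1} with the right-hand side of (8).

```python
import numpy as np


def moment_check(n, alpha, xi, beta, p_v, rng):
    """Return (direct E(q^n), right-hand side of Eq. (8), bound exp(n alpha + n^2 xi^2 / 2))."""
    q = np.exp(alpha + rng.normal(0.0, xi, p_v.size)) * p_v ** (beta - 1.0)
    prefactor = np.exp(n * alpha + 0.5 * n**2 * xi**2)
    direct = np.mean(q**n)
    via_eq8 = prefactor * np.mean(p_v ** (-n * (1.0 - beta)))
    return direct, via_eq8, prefactor


rng = np.random.default_rng(5)
p_v = 1.0 + rng.pareto(1.5, 400_000)  # hypothetical account values, all >= 1
results = {n: moment_check(n, alpha=-2.0, xi=0.8, beta=0.6, p_v=p_v, rng=rng) for n in (1, 2)}
```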



In section 3.1, we have shown that the distribution of P_v is well approximated by a log-normal distribution. This particular choice of distribution makes the previous integrals analytically tractable. Indeed, with P_{P_v} = lnN(μ, σ²), straightforward integration of (6) leads to P_Q = lnN(M, S²), where M = α - (1 - β)μ and S² = ξ² + (1 - β)²σ². This simple result has some practical interest: given the distribution parameters and the coupling factor β, one can draw realistic q factors for agent-based modeling as Q = e^{M+SX}, where X is N(0, 1) distributed. Furthermore, in the next section, we show how the value of β may be inferred from the transaction cost structure, which decreases the number of parameters to four.
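The closed form P_Q = lnN(M, S²) can be checked against a direct numerical evaluation of (7). The sketch below integrates (7) over a log-normal P_v in the variable u = log p_v with a trapezoidal rule, and `draw_q` implements the sampling recipe Q = e^{M+SX}. The parameter values and function names are hypothetical (μ and σ are merely of the order of Table 2).

```python
import math

import numpy as np


def rcdf_q_numeric(q, alpha, xi, beta, mu, sigma, n_grid=4001):
    """Eq. (7) by trapezoidal integration, with P_v ~ lnN(mu, sigma^2) written in u = log p_v."""
    u = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, n_grid)
    density = np.exp(-((u - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
    args = (math.log(q) + (1.0 - beta) * u - alpha) / (math.sqrt(2.0) * xi)
    integrand = 0.5 * np.array([math.erfc(a) for a in args]) * density
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u)))


def rcdf_q_closed_form(q, alpha, xi, beta, mu, sigma):
    """RCDF of lnN(M, S^2) with M = alpha - (1 - beta) mu and S^2 = xi^2 + (1 - beta)^2 sigma^2."""
    m = alpha - (1.0 - beta) * mu
    s = math.sqrt(xi**2 + (1.0 - beta) ** 2 * sigma**2)
    return 0.5 * math.erfc((math.log(q) - m) / (math.sqrt(2.0) * s))


def draw_q(alpha, xi, beta, mu, sigma, size, rng):
    """Sample q factors for an agent-based model as Q = exp(M + S X), X ~ N(0, 1)."""
    m = alpha - (1.0 - beta) * mu
    s = math.sqrt(xi**2 + (1.0 - beta) ** 2 * sigma**2)
    return np.exp(m + s * rng.standard_normal(size))


params = dict(alpha=1.9, xi=0.7, beta=0.7, mu=14.0, sigma=2.0)  # hypothetical values
for q in (0.01, 0.1, 0.5):
    print(q, rcdf_q_numeric(q, **params), rcdf_q_closed_form(q, **params))
```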

Figure 5 confirms the validity of the above theoretical results, once extended to the case of a bi-linear relation between T and P_v. It is noteworthy that the continuous lines are not fits to the empirical q factors but instead use the results of the separate fits on the turnover and account value distributions.



4 The influence of transaction costs on trading behavior: optimal mean-variance 
portfolios 

Apart from risk profiles, education, and typical wealth, the differences in the turnover as a function of wealth observed above between the three populations of traders may also lie in the difference of their actual transaction cost structure. Swissquote's current standard structure for the Swiss market (its shape is very similar for European and US markets) is shown in Figure 6; it is a piece-wise constant, non-linear looking function. Fitting all segments to equation (10) gives δ = 0.63 ∈ [0.5, 0.74]_95. The fee structure of most brokers is not set in stone and can be negotiated. A frequent request is to have a flat fee, i.e. a fixed cost per transaction, corresponding to a constant function. Since the negotiation power of large clients or of clients that carry out many transactions is clearly greater, asset managers are more likely to obtain a more favorable fee structure than basic retail clients.

1 Mathematically, all the moments of Q always exist since β ≤ 1 and P_{P_v}(p_v) must decay faster than p_v^{-1} to be a valid distribution.

Figure 6: Swissquote fee curve for the Swiss stock market. Commissions based on a sliding scale of costs are common practice in the world of on-line finance. The red line results from a non-linear fit to equation (10). Parameter values are C = 0.13 ∈ [0.05, 0.5]_95 and δ = 0.63 ∈ [0.5, 0.74]_95, where the 95% confidence intervals are obtained from the BC_a bootstrap method of [21].

Since buying some shares of an asset is the result of an unconscious or calculated portfolio construction process, one first needs a theoretical reference point with which to compare the population characteristics measured in the previous section. In other words, we shall use results from portfolio optimization theory with non-linear transaction cost functions to understand the results of the previous section.

Quite curiously, all analytical papers in the literature on optimal portfolios either neglect transaction costs or assume constant or linear transaction cost structures; non-linear structures are tackled numerically. We therefore incorporate the specific non-linear transaction cost structure faced by the traders under investigation into the classic one-shot portfolio optimization problem studied by Brennan [8], who restricted his discussion to fees proportional to the number of securities, in other words, a flat fee per transaction.
Building optimal mean-variance stock portfolios consists, for a given agent, in selecting which stocks to invest in and in what proportion by maximizing the expected portfolio growth, usually called return, while trying to reduce the resulting a priori risk. One cost function that corresponds to such requirements is

$$\mathcal{L}_\lambda(R) = \lambda E(R) - \mathrm{Var}(R), \qquad (9)$$

where R is the stochastic return of the portfolio over the investment horizon (e.g., one month, one year) and λ tunes the trade-off between risk and return; as such, it can be interpreted as a measure of an investor's attitude towards risk: the larger λ, the more risk-tolerant (i.e. the less risk-averse) the investor.

The return of the portfolio can be decomposed into contributions from risky assets (stocks, derivatives, etc.), the interest earned on the amount kept in cash, and the total relative cost of broker commission, which we write as R = R^risky + R^cash - R^cost. Mathematically,

• $R^{\mathrm{risky}} = \sum_{i=1}^N x_i R_i$, where R_i is the return of stock i over this horizon, x_i is the fraction of the total wealth invested in this stock, and N is the total number of investable assets; we shall denote the total fraction of wealth invested in risky assets by $x = \sum_{i=1}^N x_i$;

• $R^{\mathrm{cash}} = (1 - x)\,r$, where r is the interest rate;

• $R^{\mathrm{cost}} = \sum_{i=1}^N \frac{F(x_i P_v)}{P_v}(1 + r)$, where F(x) is the amount charged by a broker to exchange an amount x of cash into shares or vice versa.

The focus of this section is to derive explicit relationships between F, the number of assets to hold in a portfolio, and the account value P_v. Whereas previous works only considered special cases for F that are not compatible with the fee structure of Swissquote, we need to introduce a cost function that can accommodate all the standard broker commission schemes. The two extreme cases are i) a flat fee per transaction, i.e., a fixed cost that does not depend on the amount exchanged, and ii) a proportional scheme, possibly with a maximum fee. Swissquote's standard scheme stands in between and is well approximated by a power law with a maximum fee F_max. We hence choose

$$F(x) = C\,x^{\delta}, \qquad (10)$$

where δ interpolates between a flat fee (δ = 0), as in [8], and a proportional scheme (δ = 1) via a power law, and C is a constant.
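A short sketch of the power-law fee, using by default the values C = 0.13 and δ = 0.63 fitted in Figure 6 (in the units of that figure); the cap F_max is not modeled. It shows how the relative cost F(x)/x = Cx^{δ-1} decays with the amount exchanged, and how (C, δ) can be recovered from a fee schedule by least squares in log-log coordinates. The log-log fit is a simple stand-in, not the non-linear fitting procedure used for Figure 6, and the amounts are hypothetical.

```python
import numpy as np


def fee(x, C=0.13, delta=0.63):
    """Power-law commission F(x) = C * x**delta (delta = 0: flat fee, delta = 1: proportional)."""
    return C * np.asarray(x, dtype=float) ** delta


def relative_fee(x, C=0.13, delta=0.63):
    """Fee as a fraction of the amount exchanged: F(x) / x = C * x**(delta - 1)."""
    return fee(x, C, delta) / np.asarray(x, dtype=float)


def fit_fee(amounts, fees):
    """Least-squares fit of log F = log C + delta * log x."""
    delta, log_c = np.polyfit(np.log(amounts), np.log(fees), 1)
    return np.exp(log_c), delta


# Doubling the amount divides the relative fee by 2**(1 - delta), about 1.29 for delta = 0.63
ratio = relative_fee(2000.0) / relative_fee(1000.0)
amounts = np.array([500.0, 1_000.0, 5_000.0, 20_000.0, 50_000.0])  # hypothetical amounts
C_hat, delta_hat = fit_fee(amounts, fee(amounts))
```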



Following the well-known one-factor model of Sharpe [32], we assume that the return of asset i follows the global market's return R_M with an idiosyncratic proportionality factor β_i. More specifically,

$$R_i = \beta_i\,(R_M - r) + r + \epsilon_i, \qquad (11)$$

where ε_i is an uncorrelated white noise: E(ε_i) = E(ε_i ε_j) = E(R_M ε_i) = 0 for i ≠ j. This equation means that the systematic part of R_i, weighted by the asset-specific factor β_i, only applies to the return above the risk-free interest rate, also called the market risk premium.

This completely specifies the functional L_λ. To evaluate it, one first computes the expectation and variance of the portfolio return:

$$E(R) = \sum_{i=1}^N x_i E(R_i) + (1-x)\,r - \sum_{i=1}^N \frac{F(x_iP_v)}{P_v}(1+r) = \left(E(R_M) - r\right)\sum_{i=1}^N x_i\beta_i + r - \sum_{i=1}^N \frac{F(x_iP_v)}{P_v}(1+r), \qquad (12)$$

and

$$\mathrm{Var}(R) = \mathrm{Var}(R^{\mathrm{risky}}) = \mathrm{Var}(R_M)\left(\sum_{i=1}^N x_i\beta_i\right)^2 + \sum_{i=1}^N x_i^2\,\mathrm{Var}(\epsilon_i). \qquad (13)$$

Note that, since the risk-free rate is non-random here, the portfolio variance is independent of both the risk-free investment and the broker commission; this does not hold for the expected return.

In principle, the functional L_λ depends on N, the number of assets in the portfolio, on λ, the risk parameter, and on x_i, the fraction of account value to invest in risky product i. Assuming that x_i is the same for all i (i.e. equally-weighted allocation), we are left with only three parameters, since x_i = x/N. Thus, from the optimization of the resulting functional one can obtain a relationship between any two of these parameters. We are mostly interested in N as a function of x.
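For an equally weighted portfolio (x_i = x/N) the functional can be evaluated numerically, which gives a direct way to find the optimal invested fraction by brute force. The sketch assumes the power-law fee F(y) = Cy^δ in the commission term, a functional of the form λE(R) - Var(R), and hypothetical market parameters; `beta_bar` and `var_eps_bar` stand for the averages of β_i and Var(ε_i) over the N assets.

```python
import numpy as np


def moments_equal_weight(x, N, P_v, beta_bar, var_eps_bar, mean_rm, var_rm, r, C, delta):
    """E(R) and Var(R) from Eqs. (12)-(13) with x_i = x / N and fee F(y) = C * y**delta."""
    x = np.asarray(x, dtype=float)
    cost = N * C * (x * P_v / N) ** delta / P_v * (1.0 + r)  # R^cost summed over the N assets
    mean = (mean_rm - r) * x * beta_bar + r - cost
    var = x**2 * (beta_bar**2 * var_rm + var_eps_bar / N)
    return mean, var


def optimal_fraction(lam, params, grid=np.linspace(1e-4, 1.0, 20_001)):
    """Maximize lam * E(R) - Var(R) over the invested fraction x by grid search."""
    mean, var = moments_equal_weight(grid, **params)
    return grid[np.argmax(lam * mean - var)]


params = dict(N=10, P_v=1e5, beta_bar=1.0, var_eps_bar=0.04, mean_rm=0.08,
              var_rm=0.04, r=0.02, C=0.13, delta=0.63)  # hypothetical values
x_star = optimal_fraction(0.5, params)
```

A grid search is used on purpose: the fee term makes the objective non-concave near x = 0, so a first-order condition alone does not guarantee the global maximum.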



4.1 Non-linear relationship between account value and number of assets 



We will first assume that agents seek the optimal fraction x* of their account value to invest in N securities (N being known), given the risk-free rate r and the broker commission F(x_iP_v). The optimal solution is simply obtained by setting x_i = x/N in (12) and (13), and by equating to zero the derivative of L_λ with respect to x. This leads to the following transcendental equation for x*:

$$x^* = \frac{\lambda}{2}\,\frac{\bar\beta\left(E(R_M) - r\right) - \delta(1+r)\,C\left(\frac{x^*P_v}{N}\right)^{\delta-1}}{\bar\beta^2\,\mathrm{Var}(R_M) + \frac{1}{N}\overline{\mathrm{Var}}(\epsilon)}, \qquad (14)$$

where $\bar\beta = \frac{1}{N}\sum_{i=1}^N \beta_i$ and $\overline{\mathrm{Var}}(\epsilon) = \frac{1}{N}\sum_{i=1}^N \mathrm{Var}(\epsilon_i)$ is the mean idiosyncratic variance. Provided the investor risk tolerance λ has been reliably estimated, which is usually a complex task [39], and that Sharpe's model is adequate, (14) can be used directly in a real-world portfolio optimization problem. The β_i and ε_i are then obtained by regressing the returns of all the stocks with (11); the optimal solution is expected to be reliable in the absence of significant residual correlations between ε_i and ε_j.

In the more common situation where λ is unknown, one can derive a second equation for the optimal number of securities under the assumption that portfolios are sufficiently homogeneous, or that the investment horizon is long enough, so as to have β̄ and Var̄(ε) independent from N. As shown in Figure 7, β̄ on the US stock market is persistently close to one for various time horizons and values of N, consistently with the homogeneity assumption. Taking a few technical precautions into account ([8]), the differentiation of the Lagrangian L_λ with respect to N leads to



$$\lambda = \frac{\overline{\mathrm{Var}}(\epsilon)\,P_v^{1-\delta}}{2(1-\delta)\,C\,(1+r)}\left(\frac{x}{N^*}\right)^{2-\delta} \qquad (15)$$



where it is assumed that $\delta < 1$, since for $\delta = 1$ the optimal investment does not depend on $N$ through the cost function. According to (15), the agent's risk tolerance increases with his account value $P_v$, in agreement with various survey studies on the risk tolerance of actual investors (see the literature review of [38]). Using (14) and (15) to eliminate $\lambda$, we obtain



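$$N^{2-\delta}\left[1 + \frac{\delta K}{2(1-\delta)N}\right] = \frac{K\,\bar\beta\,(E(R_M)-r)}{2(1-\delta)\,C\,(1+r)}\,(xP_v)^{1-\delta} \qquad (16)$$

where $K$ is the ratio of residual risk to market risk, defined as

$$K = \left(\frac{\bar\beta^2\,\mathrm{Var}(R_M)}{\overline{\mathrm{Var}}(\epsilon)} + \frac{1}{N}\right)^{-1} \underset{N\gg 1}{\simeq} \frac{\overline{\mathrm{Var}}(\epsilon)}{\bar\beta^2\,\mathrm{Var}(R_M)} \qquad (17)$$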
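The elimination of $\lambda$ that leads to (16) can be spelled out in two steps. This is a sketch assuming the forms of (14), (15) and (17) given above: substitute (15) into (14) and divide both sides by $x\,\overline{\mathrm{Var}}(\epsilon)/N$; the left-hand side then equals $N/K$ by (17), and multiplying by $K N^{1-\delta}$ gives (16).

```latex
\frac{N\,\bar\beta^2\,\mathrm{Var}(R_M)}{\overline{\mathrm{Var}}(\epsilon)} + 1 \;=\; \frac{N}{K}
  \;=\; \frac{\bar\beta\,(E(R_M)-r)}{2(1-\delta)\,C\,(1+r)}\left(\frac{xP_v}{N}\right)^{1-\delta}
        - \frac{\delta}{2(1-\delta)}
\\[4pt]
\Longrightarrow\quad
N^{2-\delta} + \frac{\delta K}{2(1-\delta)}\,N^{1-\delta}
  \;=\; \frac{K\,\bar\beta\,(E(R_M)-r)}{2(1-\delta)\,C\,(1+r)}\,(xP_v)^{1-\delta}
```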



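Given the desired level of systematic risk $x$, (16) can be solved numerically for $N$ in an actual portfolio optimization. Further insight is gained by considering the high-diversification limit $N \gg 1$ (more precisely, $N \gg \delta K/[2(1-\delta)]$), which yields $1 + \delta K/[2(1-\delta)N] \simeq 1$ in (16) and thus

$$N \simeq \left[\frac{K\,\bar\beta\,(E(R_M)-r)}{2(1-\delta)\,C\,(1+r)}\right]^{\frac{1}{2-\delta}}(xP_v)^{\frac{1-\delta}{2-\delta}} \qquad (18)$$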



where $K$ takes its large-$N$ value given on the right-hand side of (17). The latter equation generalizes [8] to the case of a varying cost impact, represented here by the parameter $\delta$ (i.e. the result of [8] is recovered by setting $\delta = 0$ and $\bar\beta = 1$ in (18)). These results can be further generalized to non-equally weighted portfolios by differentiating $\mathcal{L}$ with respect to $x_i$ and assuming again a homogeneity condition on the $\beta_i$.
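The following sketch solves (16) for $N$ by bisection, keeping the exact $N$-dependence of $K$ from (17), and compares the result with the high-diversification formula (18). It treats $N$ as a continuous variable and assumes a single sign change of the residual (negative at small $N$, positive at large $N$); all function names and parameter values are hypothetical.

```python
def K_exact(N, beta_bar, var_M, var_eps):
    """(17): K = (beta_bar^2 Var(R_M) / Var(eps) + 1/N)^(-1)."""
    return 1.0 / (beta_bar**2 * var_M / var_eps + 1.0 / N)

def residual_16(N, x, P_v, beta_bar, ER_M, r, var_M, var_eps, C, delta):
    """Left-hand side minus right-hand side of (16)."""
    K = K_exact(N, beta_bar, var_M, var_eps)
    lhs = N ** (2 - delta) * (1 + delta * K / (2 * (1 - delta) * N))
    rhs = (K * beta_bar * (ER_M - r) * (x * P_v) ** (1 - delta)
           / (2 * (1 - delta) * C * (1 + r)))
    return lhs - rhs

def solve_N(x, P_v, beta_bar, ER_M, r, var_M, var_eps, C, delta, tol=1e-10):
    """Bisection on N > 0 with a relative stopping tolerance."""
    args = (x, P_v, beta_bar, ER_M, r, var_M, var_eps, C, delta)
    lo, hi = 1e-9, 1.0
    while residual_16(hi, *args) < 0:   # grow hi until the residual turns positive
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if residual_16(mid, *args) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def N_high_diversification(x, P_v, beta_bar, ER_M, r, var_M, var_eps, C, delta):
    """(18), with K replaced by its N >> 1 limit Var(eps) / (beta_bar^2 Var(R_M))."""
    K = var_eps / (beta_bar**2 * var_M)
    pref = K * beta_bar * (ER_M - r) / (2 * (1 - delta) * C * (1 + r))
    return pref ** (1 / (2 - delta)) * (x * P_v) ** ((1 - delta) / (2 - delta))

params = dict(beta_bar=1.0, ER_M=0.08, r=0.02, var_M=0.04, var_eps=0.09, C=0.05, delta=0.5)
N_num = solve_N(1.0, 1e6, **params)
N_18 = N_high_diversification(1.0, 1e6, **params)
print(N_num, N_18)
```

For these values the two estimates agree to within a few percent; the gap closes as $N$ grows, because both the bracket in (16) and the $1/N$ term in (17) become negligible.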



In essence, (18) says that the number of securities held in an equally-weighted mean-variance portfolio with Sharpe-like returns is related to the amount invested as

$$\log N = \frac{1-\delta}{2-\delta}\,\log(xP_v) + \kappa \qquad (19)$$

in the high-diversification limit, where $\kappa$ is the logarithm of the pre-factor of $(xP_v)^{(1-\delta)/(2-\delta)}$ in (18). The last equation gives $N$ as a function of $P_v$ for a predefined $x$ in the optimal portfolio. The heterogeneity of the traders, beyond their account value, is not yet apparent, but may enter both through $x$ and through $\kappa$. First, each trader may have his own preference regarding the fraction $x$ of his account to invest in risky assets; therefore one should replace $x$ by $x_i$. Next, $\kappa$ includes both a term related to transaction costs, which does vary from trader to trader, and some measures and expectations of market returns and variance; each trader may have his own perception or way of measuring them, hence $\kappa$ should also be replaced by $\kappa_i$. Finally, both terms can be merged into a single trader-specific constant $\kappa'_i = \frac{1-\delta}{2-\delta}\log x_i + \kappa_i$. This explains how the heterogeneity of the traders causes the fluctuations around the relationships we are interested in.
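To see why trader heterogeneity blurs the relationship without destroying its slope, the sketch below generates synthetic traders (not Swissquote data) that all obey (19) with a common, hypothetical $\delta = 0.5$, but each with its own constant merging the $x_i$ and $\kappa_i$ terms. It assumes that this constant is independent of $P_v$ and treats $\log N$ as continuous. An ordinary least-squares fit on the resulting cloud still recovers $(1-\delta)/(2-\delta)$.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (model with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

delta = 0.5                                   # hypothetical common cost exponent
slope_theory = (1 - delta) / (2 - delta)      # exponent in (19)

rng = random.Random(42)
log_Pv, log_N = [], []
for _ in range(20000):
    lp = rng.uniform(10.0, 18.0)              # synthetic log account value
    kappa_i = rng.gauss(0.0, 0.5)             # trader-specific constant, independent of P_v
    log_Pv.append(lp)
    log_N.append(slope_theory * lp + kappa_i) # (19) with a trader-specific intercept

print(f"theory {slope_theory:.3f}, fitted {ols_slope(log_Pv, log_N):.3f}")
```

The spread of the trader-specific constant only widens the cloud around the line; if it were correlated with $P_v$, the fitted slope would be biased.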



5 Turnover, number of assets and account value 



The result above only links $N$ with $P_v$, but one also wishes to obtain relationships that involve the turnover per transaction, $T$. Whereas in section 3 we characterized the turnover of any transaction, the results of section 4 rest on the assumption that agents build their portfolio by selecting a group of assets and sticking to them over a period of time. This obviously excludes both speculation through a series of buy and sell trades on even a single asset, and portfolio rebalancing, which consists in adjusting the relative proportions of some assets. We thus have to find a way to differentiate between portfolio building, rebalancing and speculation. Here, we shall focus on portfolio building in order to test the results of section 4 and link them to those of section 3.

We have found a simple and effective method that separates portfolio-building transactions from the other ones: we assume that the transactions of trader $i$ that correspond to the building of his portfolio are restricted to the first transaction in each asset not traded previously; sell orders are ignored, since Swissquote clients cannot easily short sell. In other words, if trader $i$ owns some shares of assets A, B, and C and then buys some shares of asset D, the corresponding transaction is deemed to contribute to his portfolio-building process; the set of such transactions is denoted by $\Phi_i$, a subset of the full set of trader $i$'s transactions. Any subsequent transactions in shares of assets A, B, C, or D are left out of $\Phi_i$. The number of different assets that trader $i$ owns is taken to be $N_i = |\Phi_i|$, where $|X|$ is the cardinality of set $X$; this approach assumes that a trader always owns shares in all the assets he has ever traded, which, surprisingly, is by far the most common case. We shall drop the index $i$ from now on.
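The selection rule for $\Phi_i$ can be sketched in a few lines. The transaction record layout (dicts with `trader`, `asset` and `side` keys, in chronological order) and the function name are hypothetical; the sketch also assumes that a sell neither enters $\Phi_i$ nor marks an asset as already traded.

```python
from collections import defaultdict

def portfolio_building_sets(transactions):
    """Return {trader: [transactions in Phi_trader]} from chronologically ordered
    transaction dicts: only the first buy of each asset by each trader is kept."""
    seen = defaultdict(set)     # assets already bought, per trader
    phi = defaultdict(list)
    for t in transactions:
        if t["side"] != "buy":  # sell orders are ignored
            continue
        if t["asset"] not in seen[t["trader"]]:
            seen[t["trader"]].add(t["asset"])
            phi[t["trader"]].append(t)
    return phi

trades = [
    {"trader": "i", "asset": "A", "side": "buy"},
    {"trader": "i", "asset": "B", "side": "buy"},
    {"trader": "i", "asset": "A", "side": "buy"},   # repeat of A: left out of Phi_i
    {"trader": "i", "asset": "C", "side": "buy"},
    {"trader": "i", "asset": "B", "side": "sell"},  # sell: ignored
    {"trader": "i", "asset": "D", "side": "buy"},   # first trade in D: kept
]
phi = portfolio_building_sets(trades)
N_i = len(phi["i"])             # N_i = |Phi_i|
print(N_i, [t["asset"] for t in phi["i"]])  # 4 ['A', 'B', 'C', 'D']
```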
Figure 7: Box-plot of empirical $\beta_i$ obtained from the regression of several US stocks on the S&P500. The observation period covers 2001 to 2008 and returns are computed over various time horizons $\Delta t$ (in days). Results show that $\bar\beta = \frac{1}{N}\sum_{i=1}^{N}\beta_i \simeq 1$ for all values of $\Delta t$ and (even small) $N$, consistent with the homogeneity assumption of section 4.1.
[Figure 8 panels: companies (stocks), asset managers (stocks), retail clients (stocks)]




Figure 8: Turnover $T_\Phi$ of transactions contributing to the building of a portfolio versus the number $N$ of assets held by a given trader at the time of the transaction. Green lines: non-parametric fit; red lines: fits of the linear part of the non-parametric fit. From left to right: companies, asset managers, and individuals.



Let us now focus on $T_\Phi = \sum_{k\in\Phi} T_k$, the total turnover that helped build the trader's portfolio. We should first check how it is related to the total portfolio value $P_v$. Let us define $\langle P_v\rangle_\Phi$, the account value of a trader averaged over the times at which he trades a new asset. Plotting $\log\langle P_v\rangle_\Phi$ against $\log T_\Phi$ gives a cloudy relationship, as usual, but fitting it with $\log\langle P_v\rangle_\Phi = \chi\log T_\Phi$ gives $\chi = 1.03 \pm 0.02$ for individuals, $\chi = 0.99 \pm 0.02$ for asset managers and $\chi = 1.00 \pm 0.01$ for companies, with an adjusted $R^2 = 0.99$ in all cases. This relationship trivially holds for traders who buy all their assets at once, as assumed in the portfolio model. The traders who do not lie on this line either hold positions in cash (in which case this line is a lower bound), or do not build their portfolio in a single day: they pile up positions in derivative products or stocks whose price fluctuations cause the deviations from the line. But the fact that the slope is close to 1 means that the average fluctuation is zero, hence that on average traders do not make money from the positions taken on new stocks. Consequently, $\log P_v$ can be replaced by $\log T_\Phi$ in (19); thus, setting $x = 1$,



$$\log N = \frac{1-\delta}{2-\delta}\,\log T_\Phi + \kappa \qquad (20)$$

The $x = 1$ assumption is in fact quite reasonable: most Swissquote traders do not use their trading account as a savings account and are fully invested; we do not know what amounts they keep in their other bank accounts.
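The two per-trader quantities used above, $T_\Phi$ and $\langle P_v\rangle_\Phi$, and the slope $\chi$ of their log-log relationship can be sketched as follows. The data are synthetic (not Swissquote data): fully invested traders ($x = 1$) whose account value differs from $T_\Phi$ by a mean-zero log fluctuation, mimicking gains and losses on new positions. Under that assumption the fitted slope comes out close to 1, which is the situation described in the text; the function names are hypothetical.

```python
import math
import random

def turnover_and_mean_value(phi_transactions):
    """Return (T_Phi, <P_v>_Phi) for one trader.

    phi_transactions: list of (turnover, account_value_at_that_time) pairs,
    one per portfolio-building transaction."""
    T_phi = sum(turnover for turnover, _ in phi_transactions)
    mean_Pv = sum(pv for _, pv in phi_transactions) / len(phi_transactions)
    return T_phi, mean_Pv

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (model with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sum((a - mx) ** 2 for a in xs)

rng = random.Random(1)
log_T, log_P = [], []
for _ in range(5000):
    total = math.exp(rng.uniform(8.0, 16.0))   # amount spent on new assets
    n_assets = rng.randint(1, 30)
    pv = total * math.exp(rng.gauss(0.0, 0.3)) # mean-zero log fluctuation
    phi_tx = [(total / n_assets, pv)] * n_assets
    T_phi, mean_Pv = turnover_and_mean_value(phi_tx)
    log_T.append(math.log(T_phi))
    log_P.append(math.log(mean_Pv))

chi = ols_slope(log_T, log_P)   # slope of log<P_v>_Phi versus log T_Phi
print(f"fitted slope chi = {chi:.3f}")
```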

Table 6: Slope $\alpha$ linking $\log T_\Phi$ and $\log N$ for the three trader categories.

                        individuals     companies       asset managers
  $\alpha$              0.52 ± 0.02     0.36 ± 0.14     0.44 ± 0.13
  $\log T_\Phi \in$     [16, 19]        [17, 19.8]      [15.8, 18]

Table 7: Results of the double linear regression of $\log\langle T\rangle_\Phi$ versus $\log\langle P_v\rangle_\Phi$. For each category of investors, the first and second rows correspond respectively to $\log\langle P_v\rangle_\Phi < \theta_1$ and $\log\langle P_v\rangle_\Phi > \theta_2$, where $\theta_{1,2}$ have been determined graphically using the non-parametric method of [17], as in section 3.3. Parameters are as in the double linear model. For confidentiality reasons, we have multiplied $P_v$ and $T$ by a random number, which only affects the true values of $\theta_{1,2}$ and of the ordinate $a$.

                        $\beta$         $a$             $x$     $\theta$    $R^2$
  individuals           0.85 ± 0.02     0.71 ± 0.16     0.65    14.5        0.59
                        0.51 ± 0.01     5.62 ± 0.17     0.76    15          0.31
  companies             0.83 ± 0.17     1.03 ± 2.47     0.86    15.5        0.42
                        0.62 ± 0.14     3.99 ± 2.55     0.93    17          0.32
  asset managers        0.84 ± 0.25     0.45 ± 3.77     0.79    15.95       0.50
                        0.73 ± 0.17     1.72 ± 3.23     0.72    18          0.41

A robust non-parametric fit does reveal a linear relationship between $\log N$ and $\log T_\Phi$ in a given region of the $(N, T_\Phi)$ plane (figure 8). In this region, we have

$$\log N = \alpha\,\log T_\Phi + \mathrm{cst}, \qquad (21)$$

which, by comparison with (20), gives

$$\alpha = \frac{1-\delta}{2-\delta}. \qquad (22)$$

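We still need to link $\langle T\rangle_\Phi$ and $\langle P_v\rangle_\Phi$. While section 3 showed that the unconditional averages lead to $\langle T\rangle \sim \langle P_v\rangle^\beta$, one also finds that $\langle T\rangle_\Phi \sim \langle P_v\rangle_\Phi^\beta$. Therefore, one can write

$$\log\langle T\rangle_\Phi = \beta\,\log\langle P_v\rangle_\Phi + \mathrm{cst}. \qquad (23)$$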



Thus, one is finally rewarded with the missing link²

$$\beta = \frac{1}{2-\delta}, \qquad (24)$$



which directly involves the transaction cost structure in the relationship between turnover and portfolio value, as argued in section 2. This relationship allows us to close the loop, as we are now able to relate directly the exponents linking $T$, $N$, and $P_v$. Going back to section 3, one understands that the existence of a bilinear relationship between log-turnover and log-account value, i.e., of two values of $\beta$ for each of the three categories of clients, is linked to two values of $\delta$: a flat fee structure or the disregard of transaction costs ($\delta = 0$) leads to $\beta = 1/2$, while proportional fees ($\delta = 1$) give $\beta = 1$.

² Note that this relationship can be obtained directly by assuming that all the transactions happen at the same time, hence that $T = xP_v/N$, which leads straightforwardly to (24).
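The shortcut $T = xP_v/N$ can be checked numerically: generate $N$ from the power law in (18), form $T = xP_v/N$ and $T_\Phi = NT = xP_v$, and read off the log-log slopes. The sketch below does this with a hypothetical pre-factor; it recovers $\beta = 1/(2-\delta)$ and $\alpha = (1-\delta)/(2-\delta)$, and makes visible that $\alpha + \beta = 1$ for every $\delta$.

```python
import math

def fitted_slope(xs, ys):
    """Slope between the first and last point; exact here because the
    generated relations are linear in log space."""
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])

def exponents_from_18(delta, prefactor=3.0, x=1.0):
    """N = prefactor * (x P_v)^((1-delta)/(2-delta)) as in (18); T = x P_v / N;
    T_Phi = N * T. Returns the measured (alpha, beta)."""
    log_Pv = [10.0 + 0.5 * k for k in range(17)]
    log_N = [math.log(prefactor) + (1 - delta) / (2 - delta) * (math.log(x) + lp)
             for lp in log_Pv]
    log_T = [math.log(x) + lp - ln for lp, ln in zip(log_Pv, log_N)]   # T = x P_v / N
    log_Tphi = [lt + ln for lt, ln in zip(log_T, log_N)]               # T_Phi = N T
    beta = fitted_slope(log_Pv, log_T)      # <T> ~ P_v^beta
    alpha = fitted_slope(log_Tphi, log_N)   # log N = alpha log T_Phi + cst
    return alpha, beta

for delta in (0.0, 0.5, 0.63):
    alpha, beta = exponents_from_18(delta)
    print(delta, round(alpha, 4), round(beta, 4), round(alpha + beta, 4))
```

For $\delta = 0$ this gives $\beta = 1/2$, the flat-fee value quoted above.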

Let us finally discuss the empirical values of $\alpha$, $\beta$, and $\delta$ against their theoretical counterparts; they are summarized in table 8.

1. Small values of $T_\Phi$: it was impossible to measure $\alpha$ in that case, since the non-parametric fit shows a non-linear relationship in the log-log plot for retail clients, which we trust more because they have many more points than the two other categories of clients. But it may not make sense to expect a linear relationship, since such a relationship is only expected for large enough $N$ ($N > 10$ in practice), and a small $T_\Phi$ is related to a small $N$. Thus we can only test $\beta = 1/(2-\delta)$. The reported value of $\beta$ is consistent across all the clients. Retail clients have a larger $\delta_{\rm eff} = 2 - 1/\beta$ than the estimated $\delta_{SQ}$. Since the fee structure is discontinuous, the values of these exponents can hardly be expected to match. Moreover, fitting the whole fee curve may be problematic in this context: traders with a typically small value of $T_\Phi$ see a more linear relationship in the region of small transaction values than when considering the whole curve; for instance, removing the two largest segments from the fee structure yields $\delta'_{SQ} = 0.74 \in [0.43, 0.79]$, which is not far from $\delta_{\rm eff}$.

2. Large values of $T_\Phi$: the relationships between all the exponents are verified for the three categories of clients. While not very impressive for companies and asset managers, this result is much stronger in the case of retail clients, since the relative uncertainties associated with each measured exponent are small (1-2%). The value of $\beta_{\rm retail}$ is of particular interest as it corresponds to $\delta_{\rm eff} = 0$ or, equivalently, to a flat fee structure. Going back to the fee structure of Swissquote, one finds that the transition happens when the relative transaction cost falls below some threshold (we cannot give its precise value for confidentiality reasons; it is smaller than 1%). A possible explanation is that some traders with a high enough average turnover have a flat-fee agreement with Swissquote, and that the rest of them simply act as if they were unable to take transaction costs correctly into account. Since not all traders have a flat-fee agreement, one must conclude that some traders do have problems estimating small relative fees and simply disregard them. The reported value of $\beta$ for companies and asset managers is larger than $\beta_{\rm retail}$, but the small sample size is more likely than not responsible for this discrepancy, since these two categories of clients have a greater propensity to negotiate a flat-fee structure.

3. Transition between the two regimes: the transitions between the standard Swissquote fee structure and an effective flat-fee structure occur at the same average value of $T$ for the three categories of traders (idem for $T_\Phi$). Since there is no automatic switching between fee structures at Swissquote at any predefined transaction value, one is led to conclude that this transition has behavioural origins, which also determine the value at which the transition takes place; in passing, this value corresponds to the end of the plateau of the RCDF of $P_v$ in the case of retail clients ($e^{15} \simeq 3.27 \cdot 10^6$). As a consequence, it is likely that traders tend either to neglect transaction fees smaller than some threshold or to consider them as constant when they build their portfolio.

Table 8: Table summarising the empirical and theoretical relationships between $\alpha$, $\beta$, and $\delta$. Numbers in parentheses after $\beta$ are the corresponding values of $\theta$.

  small $T_\Phi$                                      individuals             companies               asset managers
  $\beta$ ($\log T_\Phi < \theta$)                    0.85 ± 0.02 (14.5)      0.83 ± 0.17 (17)        0.84 ± 0.25 (18)
  $\delta_{\rm eff} = 2 - 1/\beta$                    0.82 ± 0.02             0.80 ± 0.20             0.81 ± 0.30
  $\delta_{SQ}$                                       0.63 ∈ [0.50, 0.74]     0.63 ∈ [0.50, 0.74]     0.63 ∈ [0.50, 0.74]
  $\delta'_{SQ}$ (two largest segments removed)       0.74 ∈ [0.43, 0.79]     0.74 ∈ [0.43, 0.79]     0.74 ∈ [0.43, 0.79]
  $\beta = 1/(2-\delta_{SQ})$                         0.73 ∈ [0.66, 0.74]     0.73 ∈ [0.66, 0.74]     0.73 ∈ [0.66, 0.74]

  large $T_\Phi$                                      individuals             companies               asset managers
  $\beta$ ($\log T_\Phi > \theta$)                    0.51 ± 0.01 (15)        0.62 ± 0.14 (17)        0.73 ± 0.17 (18)
  $\delta_{\rm eff} = 2 - 1/\beta$                    0.04 ± 0.02             0.39 ± 0.23             0.63 ± 0.23
  $\alpha_{\rm eff} = (1-\delta_{\rm eff})/(2-\delta_{\rm eff})$    0.49 ± 0.01    0.38 ± 0.09    0.27 ± 0.08
  $\alpha$ (fit range of $\log T_\Phi$)               0.52 ± 0.02 [16, 19]    0.36 ± 0.14 [17, 19.8]  0.44 ± 0.13 [15.8, 18]
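The effective exponents in table 8 follow from the measured $\beta$ by inverting (24) and then applying (22). The sketch below recomputes the point estimates only (it does not attempt to reproduce the quoted uncertainties); the dictionary simply copies the $\beta$ values listed in table 8.

```python
# Point estimates of beta taken from table 8 (uncertainties not used here).
beta_measured = {
    ("small", "individuals"): 0.85, ("small", "companies"): 0.83,
    ("small", "asset managers"): 0.84,
    ("large", "individuals"): 0.51, ("large", "companies"): 0.62,
    ("large", "asset managers"): 0.73,
}

def delta_eff(beta):
    """Invert (24): beta = 1 / (2 - delta)  =>  delta = 2 - 1 / beta."""
    return 2.0 - 1.0 / beta

def alpha_from_delta(delta):
    """(22): alpha = (1 - delta) / (2 - delta); with delta = delta_eff this equals 1 - beta."""
    return (1.0 - delta) / (2.0 - delta)

for key, b in beta_measured.items():
    d = delta_eff(b)
    print(key, round(d, 2), round(alpha_from_delta(d), 2))
```

Since $\alpha_{\rm eff} = 1 - \beta$ identically, the comparison between the last two rows of table 8 is a direct test of whether the measured slopes $\alpha$ and $\beta$ add up to one.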



6 Discussion and outlook 



We have been able to determine empirically a bilinear relationship between the average log-turnover and the average log-account value, and have argued that it comes from the transaction fee structure of the broker and its perception by the agents. A theoretical derivation of optimal, simple one-shot mean-variance portfolios with non-linear transaction costs predicted relationships between turnover, number of different assets in the portfolio and log-account value that could be verified empirically. This means that populations of traders do, on average, i.e. collectively, take transaction costs correctly into account and act collectively as mean-variance equally-weighted portfolio optimizers. This is not to say that each trader is a mean-variance optimizer, but that the population taken as a whole behaves as such, with differences across populations, as discussed in the previous section. This is to be related to the findings of Kirman's famous work on average demand and supply curves in Marseille's fish market [24] and, more generally, to what has become known as the wisdom of crowds (see [35] for an easy-to-read account).

The fact that turnover depends non-linearly on account value implies that linking the exponents of the distributions of transaction volume, buying power of large players in financial markets, and price returns is more complex than previously thought [23]. It also has implications for agent-based models, which from now on must take into account the fact that real traders invest in a number of assets that depends non-linearly on their wealth.

Future research will address the relationship between account value and trading frequency, which is of utmost importance to understand whether the many small trades of small investors have an influence on financial markets comparable to that of institutional investors. This will clarify who provides liquidity and what all the non-linear relationships found above mean in this respect. This is also crucial for agent-based models, in which one often imposes such relationships by hand, arbitrarily; conversely, one will be able to validate the evolutionary mechanisms of agent-based models according to the relationships between trading frequency, turnover, number of assets and account value that they achieve in their steady state.